Site Reliability Engineer DevOps

Remote

Full Time

Experienced

SITE RELIABILITY ENGINEER

SUMMARY:

Since 2006, PEX has been on a steady march to build and evolve a solution that helps organizations operate more efficiently, more nimbly, and more competitively.

PEX has evolved into a robust, secure SaaS solution with a deep suite of workforce spend management capabilities, advanced card controls, real-time visibility into card usage, and improved reconciliation processes. More importantly, we are providing a better, more effective solution for thousands of companies and hundreds of thousands of people in the workforce. We work each day to find new ways to help our clients operate more efficiently.

Our environment is a mix of Windows and Linux machines that reside on-premises and in the cloud. All work must be performed in strict adherence to PCI DSS requirements, and our environment must be available 24/7.

WHO YOU ARE:

As a Site Reliability Engineer, you will be responsible for planning and production operations, and for working closely with software developers and infrastructure engineers to integrate software development and delivery.

WHAT YOU’LL DO:

● Architectural oversight and ownership of the web delivery stack, from the server/service to the end user

● Continuous improvement of system and application monitoring and automation

● Ensuring sufficient monitoring of infrastructure, systems, and application availability, performance, and capacity

● Ensuring sufficient monitoring of the availability, latency, scalability, and efficiency of all services

● Promoting availability and stability in a 24/7 high-availability environment

● Participating in an on-call rotation

REQUIRED SKILLS & QUALIFICATIONS:

● Strong experience with Linux and at least one programming language (e.g. Python, Go, Ruby)

● Experience with containerization and orchestration technologies such as Docker and Kubernetes

● Experience with cloud infrastructure (e.g. Azure, AWS, GCP) as well as Infrastructure-as-Code tooling (e.g. Terraform) and CI/CD practices

● Familiarity with monitoring, tracing, and logging tools (e.g. Zabbix, Sumo Logic) and with concepts such as SLIs/SLOs and error budgets

● Strong problem-solving skills and ability to troubleshoot complex issues

● Strong communication skills and ability to work well in a team

● Experience with incident management and incident response

● Strong understanding of networking protocols and concepts

● Understanding of security concepts and best practices

● Strong understanding of system performance metrics and how to interpret them

● Ability to work independently as well as part of a team
